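Quadruped guidance robots that can guide people and avoid various obstacles could potentially be owned by more visually impaired people at a fairly low cost. In this paper, we propose a novel guidance robot system built on a comfort-based concept. We design a leash containing an elastic rope and a thin string, and use a motor to adjust the length of the string to ensure comfort. We use a force-based human motion model to plan the forces experienced by the human. The direction and magnitude of the force are then controlled by the motion of the robot and the rotation of the motor, respectively. This allows the human to be guided safely and more comfortably to a target position in complex environments. The system has been deployed on the Unitree Laikago quadrupedal platform and validated in real-world scenarios.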
translated by 谷歌翻译
Weakly supervised detection of anomalies in surveillance videos is a challenging task. Existing works have limited ability to localize anomalies in long videos; going beyond them, we propose a novel glance and focus network that effectively integrates spatial-temporal information for accurate anomaly detection. In addition, we empirically found that existing approaches that use feature magnitudes to represent the degree of anomaly typically ignore the effects of scene variations, and hence yield sub-optimal performance because feature magnitudes are inconsistent across scenes. To address this issue, we propose the Feature Amplification Mechanism and a Magnitude Contrastive Loss to enhance the discriminativeness of feature magnitudes for detecting anomalies. Experimental results on two large-scale benchmarks, UCF-Crime and XD-Violence, show that our method outperforms state-of-the-art approaches.
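Text-guided 3D shape generation remains challenging due to the absence of large paired text-shape data, the substantial semantic gap between these two modalities, and the structural complexity of 3D shapes. This paper presents a new framework called Image as Stepping Stone (ISS), which introduces 2D images as a stepping stone to connect the two modalities and to eliminate the need for paired text-shape data. Our key contribution is a two-stage feature-space alignment approach that maps CLIP features to shapes by harnessing a pre-trained single-view reconstruction (SVR) model with multi-view supervision: we first map the CLIP image feature to the detail-rich shape space of the SVR model, then map the CLIP text feature to the shape space and optimize the mapping by encouraging CLIP consistency between the input text and the rendered images. Further, we formulate a text-guided shape stylization module to dress up the output shapes with novel textures. Beyond existing works on 3D shape generation from text, our new approach is general in that it creates shapes without requiring paired text-shape data. Experimental results show that our approach outperforms the state of the art and our baselines in terms of fidelity and consistency with text. Further, our approach can stylize the generated shapes with both realistic and fantasy structures and textures.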
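The CLIP consistency term mentioned above can be illustrated with a small sketch. It assumes the consistency is measured as cosine similarity between a text feature and the features of several rendered views; the abstract does not specify the exact form, and the function names and toy vectors here are hypothetical placeholders rather than real CLIP outputs.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency_loss(text_feat: Sequence[float],
                     view_feats: List[Sequence[float]]) -> float:
    """Mean (1 - cosine similarity) between the text feature and each view.

    Assumption: this stands in for the "CLIP consistency" objective;
    the actual loss used by the authors may differ.
    """
    return sum(1.0 - cosine(text_feat, v) for v in view_feats) / len(view_feats)


# Toy features: one view matches the text direction, one is orthogonal to it.
text_feature = [1.0, 0.0]
rendered_view_features = [[1.0, 0.0], [0.0, 1.0]]
loss = consistency_loss(text_feature, rendered_view_features)
```

In a real setup the text-to-shape mapper would be updated to reduce this loss, pulling the rendered views toward the input text in the shared feature space.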
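Deep encoders have been proven effective in improving neural machine translation (NMT) systems, but translation quality reaches an upper bound when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making them impossible to train efficiently. In this paper, we present Symbiosis Networks, which comprise a full network as the Symbiosis Main Network (M-Net) and a shared sub-network with the same structure but fewer layers as the Symbiosis Sub Network (S-Net). We adopt Symbiosis Networks on the Transformer-deep (m-n) architecture and define a particular regularization loss $\mathcal{L}_{\tau}$ between the M-Net and S-Net in NMT. We train the Symbiosis Networks jointly, aiming to improve M-Net performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49, and 0.69 BLEU over classic training on the WMT'14 EN->DE, DE->EN, and EN->FR tasks. Furthermore, our Transformer-deep (12-6) even outperforms the classic Transformer-deep (18-6).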
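As a rough illustration of joint training with a regularizer between the two networks, the sketch below combines a cross-entropy term for each network with a divergence term between their output distributions. The abstract does not give the form of the regularization loss, so the KL divergence, the `weight` parameter, and all function names are assumptions made only for this example.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_loss(m_probs: Sequence[float], s_probs: Sequence[float],
               target: int, weight: float = 1.0) -> float:
    """Per-token joint objective for a main network and a sub-network.

    Assumption: KL divergence is a stand-in regularizer; the authors'
    actual regularization loss may take a different form.
    """
    return (cross_entropy(m_probs, target)
            + cross_entropy(s_probs, target)
            + weight * kl_divergence(m_probs, s_probs))


# Toy next-token distributions over a three-word vocabulary.
m_net_probs = [0.7, 0.2, 0.1]
s_net_probs = [0.6, 0.3, 0.1]
loss = joint_loss(m_net_probs, s_net_probs, target=0)
```

When the two networks agree exactly, the divergence term vanishes and the objective reduces to the sum of the two cross-entropy terms.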
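Recently, non-autoregressive (NAT) models, which predict outputs in parallel, have achieved substantial improvements in generation speed compared with autoregressive (AT) models. Because they perform worse on raw data, most NAT models are trained as student models on distilled data generated by a teacher model, a technique known as sequence-level knowledge distillation. An effective training strategy for improving model performance is Self-Distillation Mixup (SDM) training, which pre-trains a model on raw data, generates distilled data with the pre-trained model itself, and finally re-trains a model on the combination of raw and distilled data. In this work, we examine SDM for NAT models, but find that directly applying SDM to NAT models yields no improvement in translation quality. Through careful analysis, we observe that the failure is correlated with modeling diversity and confirmation bias between the teacher model and the NAT student models. Based on these findings, we propose an enhanced strategy named SDMRT, which adds two stages to classic SDM: one is pre-reranking on the self-distilled data, and the other is fine-tuning on filtered teacher-distilled data. Our results outperform baselines by 0.6 to 1.2 BLEU on multiple NAT models. As a further bonus, for iterative-refinement NAT models, our method can outperform baselines with half the number of iterations, which means a 2x speedup.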
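The data flow of classic SDM described above can be sketched as follows. This is a minimal illustration that assumes the pre-trained model is available as a plain function; `sdm_training_data` and the toy `translate` callable are hypothetical names, and the additional reranking and filtering stages of SDMRT are not shown.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def sdm_training_data(raw: List[Pair],
                      translate: Callable[[str], str]) -> List[Pair]:
    """Build the SDM training set: raw pairs plus self-distilled pairs.

    `translate` stands in for the model pre-trained on `raw`
    (a hypothetical placeholder, not a real NMT system).
    """
    # Self-distillation: re-translate every raw source sentence.
    distilled = [(src, translate(src)) for src, _ in raw]
    # Mixup: the final model is re-trained on both sets together.
    return raw + distilled


# Toy usage: the "pre-trained model" is just an upper-casing function.
raw_data = [("guten tag", "good day"), ("danke", "thanks")]
mixed = sdm_training_data(raw_data, translate=str.upper)
```

The mixed set is twice the size of the raw set, with each source sentence appearing once with its reference target and once with the model's own output.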
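Point cloud semantic segmentation often requires large-scale annotated training data, but point-wise labels are clearly too tedious to prepare. While some recent methods propose to train a 3D network with a small percentage of point labels, we take the approach to an extreme and propose "One Thing One Click," meaning that the annotator only needs to label one point per object. To leverage these extremely sparse labels in network training, we design a novel self-training approach in which we iteratively conduct training and label propagation, facilitated by a graph propagation module. In addition, we adopt a relation network to generate a prototype for each category and explicitly model the similarity among graph nodes, producing pseudo labels to guide the iterative training. Experimental results on ScanNet-v2 and S3DIS show that our self-training approach, with extremely sparse annotations, outperforms all existing weakly supervised methods for 3D semantic segmentation by a large margin, and that our results are also comparable to those of fully supervised counterparts.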
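To make the idea of spreading one-click labels over a graph concrete, here is a minimal sketch that propagates sparse labels by majority vote among already-labeled neighbors. It is a simplified stand-in and not the graph propagation module or relation network from the paper; the function name, the voting rule, and the toy graph are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def propagate_labels(neighbors: Dict[int, List[int]],
                     labels: Dict[int, Optional[str]],
                     max_rounds: int = 10) -> Dict[int, Optional[str]]:
    """Spread sparse labels over a graph by majority vote among labeled neighbors."""
    current = dict(labels)
    for _ in range(max_rounds):
        updated = dict(current)
        for node, label in current.items():
            if label is not None:
                continue  # keep clicked labels and earlier pseudo labels
            votes: Dict[str, int] = {}
            for nb in neighbors.get(node, []):
                nb_label = current.get(nb)
                if nb_label is not None:
                    votes[nb_label] = votes.get(nb_label, 0) + 1
            if votes:
                # Sorting first makes tie-breaking deterministic.
                updated[node] = max(sorted(votes), key=votes.get)
        if updated == current:
            break  # nothing changed, propagation has converged
        current = updated
    return current


# Two disconnected "objects" (nodes 0-2 and 3-5), one clicked point each.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3, 5], 5: [4]}
clicks = {0: "chair", 1: None, 2: None, 3: None, 4: "table", 5: None}
pseudo_labels = propagate_labels(neighbors, clicks)
```

In a self-training loop, the resulting pseudo labels would supervise the next round of network training, after which propagation is repeated with the updated predictions.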
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. On the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots; for example, we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method in the 10/30-shot settings. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
Micro-expressions (MEs), one of the most important psychic stress reactions, are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic ME recognition (MER) is therefore becoming increasingly crucial in the field of affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the available data remain limited. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In such scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.